Efficient data IO for a Parallel Global Cloud Resolving Model
نویسندگان
چکیده
Execution of a Global Cloud Resolving Model (GCRM) at target resolutions of 2e4 km will generate, at a minimum, 10s of Gigabytes of data per variable per snapshot. Writing this data to disk, without creating a serious bottleneck in the execution of the GCRM code, while also supporting efficient post-execution data analysis is a significant challenge. This paper discusses an Input/Output (IO) application programmer interface (API) for the GCRM that efficiently moves data from the model to disk while maintaining support for community standard formats, avoiding the creation of very large numbers of files, and supporting efficient analysis. Several aspects of the API will be discussed in detail. First, we discuss the output data layout which linearizes the data in a consistent way that is independent of the number of processors used to run the simulation and provides a convenient format for subsequent analyses of the data. Second, we discuss the flexible API interface that enables modelers to easily add variables to the output stream by specifying where in the GCRM code these variables are located and to flexibly configure the choice of outputs and distribution of data across files. The flexibility of the API is designed to allow model developers to add new data fields to the output as the model develops and new physics is added. It also provides a mechanism for allowing users of the GCRM code to adjust the output frequency and the number of fields written depending on the needs of individual calculations. Third, we describe the mapping to the NetCDF data model with an emphasis on the grid description. Fourth, we describe our messaging algorithms and IO aggregation strategies that are used to achieve high bandwidth while simultaneously writing concurrently from many processors to shared files. We conclude with initial performance results. Published by Elsevier Ltd.
منابع مشابه
An Efficient Resource Allocation for Processing Healthcare Data in the Cloud Computing Environment
Nowadays, processing large-media healthcare data in the cloud has become an effective way of satisfying the medical userschr('39') QoS (quality of service) demands. Providing healthcare for the community is a complex activity that relies heavily on information processing. Such processing can be very costly for organizations. However, processing healthcare data in cloud has become an effective s...
متن کاملCloud‐system resolving simulations with the NASA Goddard Earth Observing System global atmospheric model (GEOS‐5)
[1] The NASA Global Modeling and Assimilation Office (GMAO) has developed a global non‐hydrostatic cloud‐ system resolving capability within the NASA Goddard Earth Observing System global atmospheric model version 5 (GEOS‐5). Using a non‐hydrostatic finite‐volume dynamical core coupled with advances in the moist physics and convective parameterization the model has been used to perform cloud‐sy...
متن کاملCloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
متن کاملGreen Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowdays, energy consumption as a critical issue in distributed computing systems with high performance has become so green computing tries to energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...
متن کاملHardware Automation of Scheduler, Placer, Inter-Task Communications and IO System Functions for Manycore Processors Dynamically Shared among Multiple Applications
To enable maximizing on-time processing throughput across multiple internally pipelined/parallelized applications on dynamically shared manycore processors by eliminating system software overhead, a hardware automated implementation of the parallel execution system functions is presented. In the presented implementation scenario, the manycore processor hardware provides, besides the processing ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Environmental Modelling and Software
دوره 26 شماره
صفحات -
تاریخ انتشار 2011